
    Final Research Report for Sound Design and Audio Player

    This deliverable describes the work on Task 4.3, Algorithms for sound design and feature developments for audio player. The audio player runs on the in-store player (ISP) and renders the music playlists via beat-synchronous automatic DJ mixing, taking advantage of the rich musical content description extracted in T4.2 (beat markers, structural segmentation into intro and outro, musical and sound content classification). The deliverable covers prototypes and final results on: (1) automatic beat-synchronous mixing by beat alignment and time stretching: we developed an algorithm for beat alignment and scheduling of time-stretched tracks; (2) compensation of the play-duration changes introduced by time stretching: in order to make the playlist generator independent of beat mixing, we readjust the tempo of played tracks such that their stretched duration equals their original duration; (3) prospective research on the extraction of data from DJ mixes: to alleviate the lack of extensive ground-truth databases of DJ mixing practices, we propose steps towards extracting this data from existing mixes by alignment and unmixing of the tracks in a mix; we also show how these methods can be evaluated even without labelled test data, and propose an open dataset for further research; (4) a description of the software player module, a GUI-less application that runs on the ISP and performs streaming of tracks from disk and beat-synchronous mixing. The estimation of cue points where tracks should cross-fade is now described in D4.7, Final Research Report on Auto-Tagging of Music.
    EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC DJ
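
    The duration-compensation idea in item (2) can be illustrated with a small numerical sketch: if the crossfade regions of a track are time-stretched to match the mix tempo, the body of the track can be stretched by a compensating ratio so that the total played duration stays equal to the original duration. The sketch below is only an illustration under that simplified model (one constant ratio per region); the function and variable names are hypothetical and not the deliverable's actual code.

```python
# Minimal sketch (not the deliverable's algorithm): compensate the duration
# change introduced by time-stretching the crossfade regions of a track so
# that its total played duration stays equal to its original duration.
# Durations are in seconds, tempos in BPM. Names are illustrative only.

def compensated_body_stretch(track_dur, intro_dur, outro_dur,
                             track_bpm, mix_in_bpm, mix_out_bpm):
    """Return the stretch ratio for the track body so that
    intro + body + outro, after stretching, last exactly track_dur."""
    # Duration ratios for the crossfade regions (ratio > 1 means slower/longer).
    intro_ratio = track_bpm / mix_in_bpm
    outro_ratio = track_bpm / mix_out_bpm

    stretched_intro = intro_dur * intro_ratio
    stretched_outro = outro_dur * outro_ratio
    body_dur = track_dur - intro_dur - outro_dur

    # Choose the body ratio so that the total duration is unchanged.
    return (track_dur - stretched_intro - stretched_outro) / body_dur


# Example: a 240 s track at 126 BPM, mixed in at 124 BPM and out at 128 BPM,
# with 15 s crossfade regions on each side.
ratio = compensated_body_stretch(240.0, 15.0, 15.0, 126.0, 124.0, 128.0)
print(f"body stretch ratio: {ratio:.4f}")
```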

    Surfing the Waves: Live Audio Mosaicing of an Electric Bass Performance as a Corpus Browsing Interface

    In this paper, the authors describe how they use an electric bass as a subtle, expressive, and intuitive interface to browse the rich sample bank available to most laptop owners. This is achieved by audio mosaicing of the live bass performance through corpus-based concatenative synthesis (CBCS) techniques, allowing a mapping of the multi-dimensional expressivity of the performance onto foreign audio material, thus recycling the virtuosity acquired on the electric instrument with a trivial learning curve. This design hypothesis is contextualised and assessed within the Sandbox#n series of bass+laptop meta-instruments, and the authors describe the technical implementation, based on the open-source CataRT CBCS system adapted for live mosaicing. They also discuss their encouraging early results and provide a list of further explorations to be made with this rich new interface.
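
    At the core of CBCS is a mapping from descriptors of the live input to grains in a descriptor-annotated corpus. The following sketch illustrates that selection step only; it is not CataRT, and the two descriptors and all names are illustrative assumptions.

```python
# Minimal sketch of descriptor-driven grain selection, the core mapping of
# corpus-based concatenative synthesis: describe the live input frame, then
# pick the corpus grain whose descriptors are closest. Illustrative only.
import numpy as np

def frame_descriptors(frame, sr):
    """Tiny descriptor set: RMS loudness and normalised spectral centroid."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    rms = np.sqrt(np.mean(frame ** 2))
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, centroid / (sr / 2)])   # both roughly in [0, 1]

def nearest_grain(input_desc, corpus_descs):
    """Index of the corpus grain closest to the input in descriptor space."""
    dists = np.linalg.norm(corpus_descs - input_desc, axis=1)
    return int(np.argmin(dists))

# Usage: corpus_descs would be precomputed from the sample bank; at
# performance time, each frame of the live bass signal selects a grain.
sr = 44100
corpus_descs = np.random.rand(1000, 2)        # stand-in for a real corpus
live_frame = 0.1 * np.random.randn(2048)      # stand-in for a bass frame
grain_index = nearest_grain(frame_descriptors(live_frame, sr), corpus_descs)
```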

    Rich Contacts: Corpus-Based Convolution of Audio Contact Gestures for Enhanced Musical Expression

    We propose ways of enriching the timbral potential of gestural sonic material captured via piezo or contact microphones, through latency-free convolution of the microphone signal with grains from a sound corpus. This creates a new way to combine the sonic richness of large sound corpora, easily accessible via navigation through a timbral descriptor space, with the intuitive gestural interaction with a surface, captured by any contact microphone. We use convolution to excite the grains from the corpus via the microphone input, capturing the contact interaction sounds, which allows articulation of the corpus by hitting, scratching, or strumming a surface with various parts of the hands or with objects. We also show how grain changes must be handled carefully and how one can smoothly interpolate between neighbouring grains, and we evaluate the system against previous attempts.
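
    A rough offline sketch of the excitation idea follows: the contact-microphone signal is convolved with the currently selected grain, and a short crossfade handles grain changes. A real-time implementation would need zero-latency (partitioned) convolution; this sketch uses ordinary offline convolution, and all names are illustrative.

```python
# Offline sketch of exciting a corpus grain with a block of contact-microphone
# input, with a short crossfade between the previous and new grain to avoid
# clicks when the selected grain changes. Illustrative only, not the paper's
# zero-latency implementation.
import numpy as np
from scipy.signal import fftconvolve

def excite_grain(mic_block, new_grain, old_grain=None, fade_len=256):
    out_new = fftconvolve(mic_block, new_grain)
    if old_grain is None:
        return out_new
    out_old = fftconvolve(mic_block, old_grain)
    n = min(fade_len, len(out_new), len(out_old))
    fade = np.linspace(0.0, 1.0, n)
    out_new[:n] = (1.0 - fade) * out_old[:n] + fade * out_new[:n]
    return out_new
```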

    Methods and Datasets for DJ-Mix Reverse Engineering

    DJ techniques are an important part of popular music culture. However, they remain insufficiently investigated by researchers, owing to the lack of annotated datasets of DJ mixes. This paper aims to fill this gap by introducing novel methods to automatically deconstruct and annotate recorded mixes for which the constituent tracks are known. A rough alignment first estimates where in the mix each track starts and which time-stretching factor was applied. Second, a sample-precise alignment determines the exact offset of each track in the mix. Third, we propose a new method to estimate the cue points and fade curves; it operates in the time-frequency domain to increase robustness to interference from other tracks. The proposed methods are finally evaluated on our new publicly available DJ-mix dataset. This dataset contains automatically generated beat-synchronous mixes based on freely available music tracks, together with ground truth about the placement of each track in the mix.
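
    The first (rough alignment) step can be sketched as a search over candidate time-stretch factors combined with cross-correlation of coarse feature envelopes. The code below is a simplified illustration of that idea, not the paper's method; the envelope features, factor grid, and names are assumptions.

```python
# Minimal sketch of rough alignment: estimate where a known track starts in a
# mix and which time-stretch factor was applied, by scanning a grid of
# candidate factors and cross-correlating coarse energy envelopes.
import numpy as np

def energy_envelope(signal, frame=4096, hop=2048):
    n = 1 + max(0, len(signal) - frame) // hop
    return np.array([np.sqrt(np.mean(signal[i * hop:i * hop + frame] ** 2))
                     for i in range(n)])

def rough_alignment(mix, track, factors=np.linspace(0.9, 1.1, 41)):
    """Return (best_stretch_factor, offset_in_frames, score)."""
    env_mix = energy_envelope(mix)
    env_trk = energy_envelope(track)
    best = (None, None, -np.inf)
    for f in factors:
        # Stretching the track by factor f scales its envelope length by f.
        n_stretched = max(2, int(round(len(env_trk) * f)))
        x_old = np.linspace(0.0, 1.0, len(env_trk))
        x_new = np.linspace(0.0, 1.0, n_stretched)
        env_f = np.interp(x_new, x_old, env_trk)
        corr = np.correlate(env_mix, env_f, mode="valid")
        offset = int(np.argmax(corr))
        if corr[offset] > best[2]:
            best = (f, offset, float(corr[offset]))
    return best
```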

    State of the Art in Sound Texture Synthesis

    The synthesis of sound textures, such as rain, wind, or crowds, is an important application for cinema, multimedia creation, games, and installations. However, despite the clearly defined requirements of naturalness and flexibility, no automatic method has yet found widespread use. After clarifying the definition, terminology, and usages of sound texture synthesis, we give an overview of the many existing methods and approaches and the few available software implementations, and classify them by the synthesis model they are based on, such as subtractive or additive synthesis, granular synthesis, corpus-based concatenative synthesis, wavelets, or physical modeling. We also give an overview of the analysis methods used for sound texture synthesis, such as segmentation, statistical modeling, timbral analysis, and modeling of transitions.
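
    As a concrete example of one family of approaches surveyed here, a minimal granular resynthesis of a texture can be sketched as overlap-adding randomly picked, windowed grains from a source recording; the parameters and names below are illustrative.

```python
# Minimal sketch of granular texture synthesis: overlap-add randomly chosen,
# windowed grains from a source recording to produce a new stream with
# similar short-term statistics. Illustrative parameters and names only.
import numpy as np

def granular_texture(source, out_len, grain_len=4096, hop=1024, rng=None):
    """source: 1-D array longer than grain_len; out_len: output length in samples."""
    if rng is None:
        rng = np.random.default_rng()
    window = np.hanning(grain_len)
    out = np.zeros(out_len + grain_len)
    for start in range(0, out_len, hop):
        pos = rng.integers(0, len(source) - grain_len)   # random grain position
        out[start:start + grain_len] += source[pos:pos + grain_len] * window
    return out[:out_len]
```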

    Improving polyphonic and poly-instrumental music to score alignment

    Music alignment links events in a score to points on the time axis of an audio performance. All parts of a recording can thus be indexed according to score information. The automatic alignment presented in this paper is based on a dynamic time warping method. Local distances are computed from the signal's spectral features using an attack-plus-sustain note model. The method is applied to mixtures of harmonic sustained instruments, excluding percussion for the moment. Good alignment has been obtained for polyphony of up to five instruments. The method is robust to difficulties such as trills, vibratos, and fast sequences. It also provides an accurate indication of the position of interpretation errors and of extra or missing notes. Implementation optimizations allow aligning long sound files in a relatively short time. Evaluation results have been obtained on piano jazz recordings.
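
    The alignment machinery underlying such a system can be sketched as classical dynamic time warping over per-frame spectral features with a local cosine distance. The sketch below shows only that generic DTW core; the paper's attack-plus-sustain note model and its optimizations are not reproduced, and all names are illustrative.

```python
# Minimal sketch of dynamic time warping between score-frame templates and
# audio-frame features, using a cosine local distance. Illustrative only.
import numpy as np

def dtw_path(score_feats, audio_feats):
    """score_feats: (n, d), audio_feats: (m, d). Returns a list of (i, j) pairs."""
    n, m = len(score_feats), len(audio_feats)

    def cost(i, j):
        a, b = score_feats[i], audio_feats[j]
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Accumulated cost matrix with the standard three-step recursion.
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost(0, 0)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = cost(i, j) + prev

    # Backtrack the optimal path from the end to the start.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                   key=lambda c: acc[c])
        path.append((i, j))
    return path[::-1]
```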

    Interactive Sound Texture Synthesis through Semi-Automatic User Annotations

    We present a way to make environmental recordings controllable by means of continuous annotations of the high-level semantic parameter one wishes to control, e.g. wind strength or crowd excitation level. A partial annotation can be propagated to cover the entire recording via cross-modal analysis between gesture and sound using canonical time warping (CTW). The annotations serve as a descriptor for lookup in corpus-based concatenative synthesis in order to invert the sound/annotation relationship. The workflow has been evaluated in a preliminary subject test: canonical correlation analysis (CCA) shows high consistency between annotations, and a small set of audio descriptors is well correlated with them. An experiment on the propagation of annotations shows the superior performance of CTW over CCA with as little as 20 s of annotated material.
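
    The CCA-based consistency check mentioned above can be sketched with standard tooling: compute the first canonical correlation between a continuous annotation track and frame-wise audio descriptors of the same recording. This is only an illustration using scikit-learn, not the paper's CTW-based propagation; descriptor extraction and all names are assumed.

```python
# Minimal sketch: first canonical correlation between a continuous user
# annotation (e.g. wind strength) and frame-wise audio descriptors of the
# same recording. A high correlation suggests the annotation can be
# predicted from the descriptors. Illustrative only.
import numpy as np
from sklearn.cross_decomposition import CCA

def annotation_descriptor_correlation(annotation, descriptors):
    """annotation: (n_frames, 1) continuous rating;
    descriptors: (n_frames, d) audio descriptors (loudness, centroid, ...)."""
    cca = CCA(n_components=1)
    a_scores, d_scores = cca.fit_transform(annotation, descriptors)
    return float(np.corrcoef(a_scores[:, 0], d_scores[:, 0])[0, 1])
```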

    Final Research Report on Auto-Tagging of Music

    The deliverable D4.7 concerns the work achieved by IRCAM until M36 on the “auto-tagging of music”. The deliverable is a research report. The software libraries resulting from the research have been integrated into the Fincons/HearDis! Music Library Manager or are used by TU Berlin. The final software libraries are described in D4.5. The research work on auto-tagging has concentrated on four aspects: 1) Further improving IRCAM’s machine-learning system ircamclass. This has been done by developing the new MASSS audio features and by integrating audio augmentation and audio segmentation into ircamclass. The system has then been used to train the HearDis! “soft” features (Vocals-1, Vocals-2, Pop-Appeal, Intensity, Instrumentation, Timbre, Genre, Style). This is described in Part 3. 2) Developing two sets of “hard” features (i.e. related to musical or musicological concepts) as specified by HearDis! (for integration into the Fincons/HearDis! Music Library Manager) and TU Berlin (as input for the prediction model of the GMBI attributes). Such features are either derived from previously estimated higher-level concepts (such as structure, key, or chord succession) or obtained by developing new signal-processing algorithms (such as HPSS or main-melody estimation). This is described in Part 4. 3) Developing audio features to characterize the audio quality of a music track. The goal is to describe the quality of the audio independently of its apparent encoding. This is then used to estimate audio degradation or music decade, and to ensure that playlists contain tracks with similar audio quality. This is described in Part 5. 4) Developing innovative algorithms to extract specific audio features to improve music mixes. So far, innovative techniques (based on various blind audio source separation algorithms and convolutional neural networks) have been developed for singing voice separation, singing voice segmentation, music structure boundary estimation, and DJ cue-region estimation. This is described in Part 6.
    EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC DJ
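
    The auto-tagging principle in point 1) boils down to supervised classification of per-track audio feature vectors into tag values. The sketch below shows that generic setup with scikit-learn; it is not ircamclass and does not use the MASSS features, and feature extraction is assumed to have been done elsewhere.

```python
# Minimal sketch of auto-tagging as supervised classification: map per-track
# audio feature vectors to a tag such as a genre or "Vocals" label.
# Generic scikit-learn code, illustrative names only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_tagger(features, labels):
    """features: (n_tracks, d) audio descriptors; labels: (n_tracks,) tag values."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(model, features, labels, cv=5)   # quick sanity check
    model.fit(features, labels)
    return model, scores.mean()
```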

    Sensus Communis: Some Perspectives on the Origins of Non-synchronous Cross-Sensory Associations

    Adults readily make associations between stimuli perceived consecutively through different sense modalities, such as shapes and sounds. Researchers have only recently begun to investigate such correspondences in infants, and only a handful of studies have focused on infants under one year old. Are infants able to make cross-sensory correspondences from birth? Do certain correspondences require extensive real-world experience? Some studies have shown that newborns are able to match stimuli perceived in different sense modalities, yet the origins and mechanisms underlying these abilities remain unclear. The present paper explores these questions and reviews some hypotheses on the emergence and early development of cross-sensory associations and their possible links with language development. Indeed, if infants can perceive cross-sensory correspondences between events that share certain features but are not strictly contingent or co-located, one may posit that they are using a “sixth sense” in Aristotle’s sense of the term. A likely candidate for explaining this mechanism, as Aristotle suggested, is movement.